Improving fold recognition without folds.
نویسندگان
چکیده
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from any of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
منابع مشابه
Improving Protein Fold Recognition by Deep Learning Networks
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and...
متن کاملImproving taxonomy-based protein fold recognition by using global and local features.
Fold recognition from amino acid sequences plays an important role in identifying protein structures and functions. The taxonomy-based method, which classifies a query protein into one of the known folds, has been shown very promising for protein fold recognition. However, extracting a set of highly discriminative features from amino acid sequences remains a challenging problem. To address this...
متن کاملClassification Accuracy and Model Selection in k-Nearest Neighbors Classifiers for Data Driven Learning
Pattern classification is a core research area and a main task in pattern recognition. A classifier induced by machine learning algorithms maps an unlabeled instance to a label using internal data structures. In this paper we experiment first by changing the k value of nearest neighbors from 3 to 15 and compare the accuracy of two classifiers on various training and test sets. The results show ...
متن کاملHelix-bundle membrane protein fold templates.
In the fold recognition approach to structure prediction, a sequence is tested for compatibility with an already known fold. For membrane proteins, however, few folds have been determined experimentally. Here the feasibility of computing the vast majority of likely membrane protein folds is tested. The results indicate that conformation space can be effectively sampled for small numbers of heli...
متن کاملMulti-class protein fold recognition using support vector machines and neural networks
MOTIVATION Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of molecular biology
دوره 341 1 شماره
صفحات -
تاریخ انتشار 2004